All Questions

1 vote
0 answers
57 views

Can I reduce computation by only predicting response tokens in a transformer and still get the same gradients?

I have been looking at the source code of the Stanford Alpaca model, and I believe that during training, the whole instruction + response data is fed into the model normally. Then the instruction part ...
Tianchen Zheng
